DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech
نویسندگان
چکیده
We propose a number of enhancement techniques to improve speech quality in bandwidth expansion (BWE) from narrowband to wideband speech, addressing three issues, which could be critical in real-world applications, namely: (1) discontinuity between narrowband spectrum and the estimated high frequency spectrum, (2) energy mismatch between testing and training utterances, and (3) expanding bandwidth of out-ofdomain speech signals. With an inherent prediction of missing high frequency features in bandwidth-expanded speech we also explore the feasibility of adding these estimated features to those extracted from narrowband speech in order to improve the system performance for automatic speech recognition (ASR) of narrowband speech. Leveraging upon a recently-proposed deep neural network based speech BWE system intended for hearing quality enhancement these techniques not only improve over the traditionally-adopted objective and subjective measures but also reduce the word error rate (WER) from 8.67% when recognizing narrowband speech to 8.26% when recognizing bandwidthexpanded speech, and almost approaching the WER of 8.12% when recognizing wideband speech in the 20,000-word openvocabulary Wall Street Journal ASR task.
منابع مشابه
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملStatistical methods for incomplete speech data
Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi Author Ulpu Remes Name of the doctoral dissertation Statistical methods for incomplete speech data Publisher School of Electrical Engineering Unit Department of Signal Processing and Acoustics Series Aalto University publication series DOCTORAL DISSERTATIONS 149/2016 Field of research Speech and Language Technology Manuscript submitt...
متن کاملروشی جدید در بازشناسی مقاوم گفتار مبتنی بر دادگان مفقود با استفاده از شبکه عصبی دوسویه
Performance of speech recognition systems is greatly reduced when speech corrupted by noise. One common method for robust speech recognition systems is missing feature methods. In this way, the components in time - frequency representation of signal (Spectrogram) that present low signal to noise ratio (SNR), are tagged as missing and deleted then replaced by remained components and statistical ...
متن کاملSpeech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
متن کاملA Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کامل